Abstract

The Gap-Hamming-Distance problem arose in the context of proving space lower bounds for a number
of key problems in the data stream model. In this problem, Alice and Bob have to decide whether
the Hamming distance between their n-bit input strings is large (i.e., at least $n/2 + \sqrt{n}$) or small
(i.e., at most $n/2 - \sqrt{n}$); they do not care if it is neither large nor small. This $\Theta(\sqrt{n})$ gap in the problem
specification is crucial for capturing the approximation allowed to a data stream algorithm.

Thus far, for randomized communication, an $\Omega(n)$ lower bound on this problem was known only
in the one-way setting. We prove an $\Omega(n)$ lower bound for randomized protocols that use any constant
number of rounds.

As a consequence we conclude, for instance, that $\varepsilon$-approximately counting the number of distinct
elements in a data stream requires $\Omega(1/\varepsilon^2)$ space, even with multiple (a constant number of) passes over
the input stream. This extends earlier one-pass lower bounds, answering a long-standing open question.
We obtain similar results for approximating the frequency moments and for approximating the empirical
entropy of a data stream.

In the process, we also obtain tight $n - \Theta(\sqrt{n}\log n)$ lower and upper bounds on the one-way
deterministic communication complexity of the problem. Finally, we give a simple combinatorial proof of an
$\Omega(n)$ lower bound on the one-way randomized communication complexity.



1 Introduction 



This paper concerns communication complexity, which is a heavily-studied basic computational model, and 
is a powerful abstraction useful for obtaining results in a variety of settings not necessarily involving commu- 
nication. To cite but two examples, communication complexity has been applied to prove lower bounds on 
circuit depth (see, e.g., [KW90]) and on query times for static data structures (see, e.g., [MNSW98, Pat08]).
The basic setup involves two players, Alice and Bob, each of whom receives an input string. Their goal is to 
compute some function of the two strings, using a protocol that involves exchanging a small number of bits. 
When communication complexity is applied as a lower bound technique — as it often is — one seeks to 
prove that there does not exist a nontrivial protocol, i.e., one that communicates only a sublinear number of
bits, for computing the function of interest. Naturally, such a proof is more challenging when the protocol 
is allowed to be randomized and err with some small probability on each input. 

The textbook by Kushilevitz and Nisan [KN97] provides detailed coverage of the basics of communication
complexity, and of a number of applications, including the two mentioned above. In this paper, we only
recap the most basic notions, in Section 2.

Our focus here is on a specific communication problem — the Gap-Hamming-Distance problem — 
that, to the best of our knowledge, was first formally studied by Indyk and Woodruff [IW03] in FOCS 2003.
They studied the problem in the context of proving space lower bounds for the Distinct Elements problem in 
the data stream model. We shall discuss their application shortly, but let us first define our communication 
problem precisely. 

The Problem. In the Gap-Hamming-Distance problem, Alice receives a Boolean string $x \in \{0,1\}^n$ and
Bob receives $y \in \{0,1\}^n$. They wish to decide whether x and y are "close" or "far" in the Hamming sense.
That is, they wish to output 0 if $\Delta(x,y) \le n/2 - \sqrt{n}$ and 1 if $\Delta(x,y) \ge n/2 + \sqrt{n}$. They do not care about
the output if neither of these conditions holds. Here, $\Delta$ denotes Hamming distance. In the sequel, we shall
be interested in a parametrized version of the problem, where the thresholds are set at $n/2 \pm c\sqrt{n}$, for some
parameter $c \in \mathbb{R}^+$.

Our Results. While we prove a number of results about the Gap-Hamming-Distance problem here, there 
is a clear "main theorem" that we wish to highlight. Technical terms appearing below are defined precisely
in Section 2.

Theorem 1 (Main Theorem, Informal). Suppose a randomized $\frac13$-error protocol solves the Gap-Hamming-Distance
problem using k rounds of communication. Then, at least one message must be $n/2^{O(k^2)}$ bits
long. In particular, any protocol using a constant number of rounds must communicate $\Omega(n)$ bits in some
round. In fact, these bounds apply to deterministic protocols with low distributional error under the uniform
distribution.

Notice that our lower bound applies to the maximum message length, not just the total length. 

At the heart of our proof is a round elimination lemma that lets us "eliminate" the first round of com- 
munication, in a protocol for the Gap-Hamming-Distance problem, and thus derive a shorter protocol for an 
"easier" instance of the same problem. By repeatedly applying this lemma, we eventually eliminate all of 
the communication. We also make the problem instances progressively easier, but, if the original protocol 
was short enough, at the end we are still left with a nontrivial problem. The resulting contradiction lower 
bounds the length of the original protocol. We note that this underlying "round elimination philosophy" is 
behind a number of key results in communication complexity [MNSW98, Sen03, CR04, ADHP06, Cha07,
VW07, CJP08].

Besides the above theorem, we also prove tight lower and upper bounds of $n - \Theta(\sqrt{n}\log n)$ on the one-way
deterministic communication complexity of Gap-Hamming-Distance. Only $\Omega(n)$ lower bounds were
known before. We also prove an $\Omega(n)$ one-way randomized communication lower bound. This matches
earlier results, but our proof has the advantage of being purely combinatorial. (We recently learned that 
Woodruff [Woo09] had independently discovered a similar combinatorial proof. We present our proof nev- 
ertheless, for pedagogical value, as it can be seen as a generalization of our deterministic lower bound 
proof.) 

Motivation and Relation to Prior Work. We now describe the original motivation for studying the Gap- 
Hamming-Distance problem. Later, we discuss the consequences of our Theorem 1. In the data stream
model, one wishes to compute a real-valued function of a massively long input sequence (the data stream)
using very limited space, hopefully sublinear in the input length. To get interesting results, one almost
always needs to allow randomized approximate algorithms. A key problem in this model, which has seen
much research [FM85, AMS99, BJK+04, IW03, Woo09], is the Distinct Elements problem: the goal is to
estimate the number of distinct elements in a stream of m elements (for simplicity, assume that the elements
are drawn from the universe $[m] := \{1, 2, \ldots, m\}$).

An interesting solution to this problem would give a nontrivial tradeoff between the quality of approximation
desired and the space required to achieve it. The best such result [BJK+04] achieved a multiplicative
$(1+\varepsilon)$-approximation using space $\tilde{O}(1/\varepsilon^2)$, where the $\tilde{O}$-notation suppresses $\log m$ and $\log(1/\varepsilon)$ factors.
It also processed the input stream in a single pass, a very desirable property. Soon afterwards, Indyk and
Woodruff [IW03] gave a matching $\Omega(1/\varepsilon^2)$ space lower bound for one-pass algorithms for this problem, by
a reduction from the Gap-Hamming-Distance communication problem. In SODA 2004, Woodruff [Woo04]
improved the bound, extending it to the full possible range of subconstant $\varepsilon$, and also applied it to the more
general problem of estimating frequency moments $F_p := \sum_{i=1}^{m} f_i^p$, where $f_i$ is the frequency of element i
in the input stream. A number of other natural data stream problems have similar space lower bounds via
reductions from Gap-Hamming, a more recent example being the computation of the empirical entropy of a
stream [CCM07].

The idea behind the reduction is quite simple: Alice and Bob can convert their Gap-Hamming inputs 
into suitable streams of integers, and then simulate a one-pass streaming algorithm using a single round of 
communication in which Alice sends Bob the memory contents of the algorithm after processing her stream. 
In this way, an $\Omega(n)$ one-way communication lower bound translates into an $\Omega(1/\varepsilon^2)$ one-pass space lower
bound. Much less simple was the proof of the communication lower bound itself. Woodruff's proof [Woo04]
required intricate combinatorial arguments and a fair amount of complex calculations. Jayram et al. [JKS07]
later provided a rather different proof, based on a simple geometric argument, coupled with a clever reduction
from the INDEX problem. A version of this proof is given in Woodruff's Ph.D. thesis [Woo07]. In
Section 5 we provide a still simpler direct combinatorial proof, essentially from first principles.
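The simulation step of this reduction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `ToyDistinctCounter`, `alice` and `bob` are hypothetical names, the toy algorithm stores the exact set of items it has seen (so it is not a small-space algorithm), and the conversion of Gap-Hamming inputs into streams is not shown. What the sketch does show is that Alice's single message is exactly the algorithm's memory contents.

```python
import json


class ToyDistinctCounter:
    """Hypothetical one-pass streaming algorithm. It keeps the exact set of items seen,
    so it is NOT small-space; it only illustrates the one-way simulation."""

    def __init__(self, seen=()):
        self.seen = set(seen)

    def process(self, item):
        self.seen.add(item)

    def estimate(self):
        return len(self.seen)

    def memory(self):
        """Serialize the algorithm's entire memory contents."""
        return json.dumps(sorted(self.seen)).encode()

    @classmethod
    def from_memory(cls, blob):
        return cls(json.loads(blob))


def alice(stream_a):
    """Alice runs the algorithm on her part of the stream; her one message is its memory."""
    alg = ToyDistinctCounter()
    for item in stream_a:
        alg.process(item)
    return alg.memory()


def bob(message, stream_b):
    """Bob restores the memory, continues the same pass on his part, and outputs the answer."""
    alg = ToyDistinctCounter.from_memory(message)
    for item in stream_b:
        alg.process(item)
    return alg.estimate()


message = alice([1, 2, 2, 5])
print(bob(message, [5, 7]), "distinct items;", len(message), "bytes communicated")
```

Because the message is the serialized memory, its length equals the space the algorithm uses, which is why a one-way communication lower bound transfers to a one-pass space lower bound.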

All of this left open the tantalizing possibility that a second pass over the input stream could drastically 
reduce the space required to approximate the number of distinct elements (or, more generally, the
frequency moments $F_p$). Perhaps $O(1/\varepsilon)$ space was possible? This was a long-standing open problem [Kum06]
in data streams. Yet, some thought about the underlying Gap-Hamming communication problem suggested 
that the linear lower bound ought to hold for general communication protocols, not just for one-way com- 
munication. This prompted the following natural conjecture. 

Conjecture 2. A $\frac13$-error randomized communication protocol for the Gap-Hamming-Distance problem
must communicate $\Omega(n)$ bits in total, irrespective of the number of rounds of communication.

An immediate consequence of the above conjecture is that a second pass does not help beat the $\Omega(1/\varepsilon^2)$
space lower bound for the aforementioned streaming problems; in fact, no constant number of passes helps.
Our Theorem 1 does not resolve Conjecture 2. However, it does imply the $\Omega(1/\varepsilon^2)$ space lower bound with
a constant number of passes. This is because we do obtain a linear communication lower bound with a
constant number of rounds.

Finer Points. To better understand our contribution here, it is worth considering some finer points of pre- 
viously known lower bounds on Gap-Hamming-Distance, including some "folklore" results. The earlier 
one-way bounds were inherently one-way, because the INDEX problem has a trivial two-round pro- 
tocol. Also, the nature of the reduction implied a distributional error lower bound for Gap-Hamming only 
under a somewhat artificial input distribution. Our bounds here, including our one-way randomized bound, 
overcome this problem, as does the recent one-way bound of Woodruff [Woo09]: they apply to the uniform
distribution. As noted by Woodruff [Woo09], this has the desirable consequence of implying space lower 
bounds for the Distinct Elements problem under weaker assumptions about the input stream: it could be 
random, rather than adversarial. 

Intuitively, the uniform distribution is the hard case for the Gap-Hamming problem. The Hamming distance
between two uniformly distributed n-bit strings is likely to be just around the $n/2 \pm \Theta(\sqrt{n})$ thresholds,
which means that a protocol will have to work hard to determine which threshold the input is at. Indeed, this
line of thinking suggests an $\Omega(n)$ lower bound for distributional complexity, under the uniform distribution,
on the gapless version of the problem. Our proofs here confirm this intuition, at least for a constant
number of rounds. 

It is relatively easy to obtain an $\Omega(n)$ lower bound on the deterministic multi-round communication
complexity of the problem. One can directly demonstrate that the communication matrix contains no large
monochromatic rectangles (see, e.g., [Woo07]). Indeed, the argument goes through even with gaps of the
form $n/2 \pm \Theta(n)$, rather than $n/2 \pm \Theta(\sqrt{n})$. It is also easy to obtain an $\Omega(n)$ bound on the randomized
complexity of the gapless problem, via a reduction from DISJOINTNESS. Unfortunately, the known hard
distributions for DISJOINTNESS are far from uniform, and DISJOINTNESS is actually very easy under a 
uniform input distribution. So, this reduction does not give us the results we want. 

Furthermore, straightforward rectangle-based methods (discrepancy/corruption) fail to effectively lower
bound the randomized communication complexity of our problem. This is because there do exist very large
near-monochromatic rectangles in its communication matrix. This can be seen, e.g., by considering all
inputs (x, y) with $x_i = y_i = 0$ for $i \in [n/100]$: for such inputs, $\Delta(x,y)$ is concentrated around $99n/200 = n/2 - n/200$,
which lies below $n/2 - c\sqrt{n}$ for large n, so almost all of them have output 0.

Connection to Decision Trees and Quantum Communication. We would like to bring up two other 
illuminating observations. Consider the following query complexity problem: the input is a string
$x \in \{0,1\}^n$ and the desired output is 1 if $|x| \ge n/2 + \sqrt{n}$ and 0 if $|x| \le n/2 - \sqrt{n}$. Here, $|x|$ denotes the
Hamming weight of x. The model is a randomized decision tree whose nodes query individual bits of x,
and whose leaves give outputs in $\{0,1\}$. It is not hard to show that $\Omega(n)$ queries are needed to solve this
problem with $\frac13$ error. Essentially, one can do no better than sampling bits of x at random, and then $\Omega(1/\varepsilon^2)$
samples are necessary to distinguish a biased coin that shows heads with probability $\frac12 + \varepsilon$ from one that
shows heads with probability $\frac12 - \varepsilon$; here $\varepsilon = 1/\sqrt{n}$, so $1/\varepsilon^2 = n$.
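The sampling strategy described above can be made concrete. The sketch below assumes the decision tree samples bits of x uniformly with replacement and answers by majority (our assumption; the text only says "sampling bits of x at random"). For such a tree the number of 1s read is binomial, so its success probability can be computed exactly, here in log space to avoid underflow. The helper name `majority_success` is ours.

```python
import math


def majority_success(n, weight, samples):
    """Probability that the sampling tree outputs 1 on an input x with |x| = weight.

    The tree reads `samples` uniformly random bits of x (with replacement) and outputs 1
    iff more than half of them are 1, so the number of 1s read is Binomial(samples, weight/n).
    Use an odd `samples` to avoid ties.
    """
    p = weight / n
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(samples // 2 + 1, samples + 1):
        log_term = (math.lgamma(samples + 1) - math.lgamma(k + 1) - math.lgamma(samples - k + 1)
                    + k * log_p + (samples - k) * log_q)
        total += math.exp(log_term)
    return total


n = 10_000
heavy = n // 2 + math.isqrt(n)   # |x| = n/2 + sqrt(n): bias eps = 1/sqrt(n) = 0.01
for m in (101, 1_001, 10_001):
    print(m, round(majority_success(n, heavy, m), 3))
```

The success probability stays near 1/2 when the number of samples is much smaller than $1/\varepsilon^2$, and only approaches 1 once it is a constant multiple of $1/\varepsilon^2 = n$.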

The Gap-Hamming-Distance problem can be seen as a generalization of this problem to the communica- 
tion setting. Certainly, any efficient decision tree for the query problem implies a correspondingly efficient 
communication protocol, with Alice acting as the querier and Bob acting as the responder (say). Conjecture 2
says that no better communication protocols are possible for this problem.

This query complexity connection brings up another crucial point. The quantum query complexity of the 
above problem can be shown to be $O(\sqrt{n})$, by the results of Nayak and Wu [NW99]. This in turn implies an
$O(\sqrt{n}\log n)$ quantum communication protocol for Gap-Hamming, essentially by carefully "implementing"
the quantum query algorithm, as in Razborov [Raz02]. Therefore, any technique that seeks to prove an
$\Omega(n)$ lower bound for Gap-Hamming (under classical communication) must necessarily fail for quantum
protocols. This rules out several recently-developed methods, such as the factorization norms method of
Linial and Shraibman [LS07] and the pattern matrix method of Sherstov [She08].

Connections to Recent Work. Our multi-round $\Omega(n)$ bound turns out to also have applications [ABC09]
to the communication complexity of several distributed "functional monitoring" problems, studied recently
by Cormode et al. [CMY08] in SODA 2008. Also, our lower bound approach here uses and extends
a subspace-finding technique recently developed by Brody [Bro09] to prove lower bounds on multiparty
pointer jumping.



2 Basic Definitions, Notation and Preliminaries 

We begin with definitions of our central problem of interest, and quickly recall some standard definitions 
from communication complexity. Along the way, we also introduce some notation that we use in the rest of 
the paper. 

Definition 1. For strings $x, y \in \{0,1\}^n$, the Hamming distance between x and y, denoted $\Delta(x,y)$, is defined
as the number of coordinates $i \in [n]$ such that $x_i \ne y_i$.

Definition 2 (Gap-Hamming-Distance problem). Suppose $n \in \mathbb{N}$ and $c \in \mathbb{R}^+$. The c-Gap-Hamming-Distance
partial function, on n-bit inputs, is denoted $\mathrm{GHD}_{c,n}$ and is defined as follows.

$$\mathrm{GHD}_{c,n}(x,y) = \begin{cases} 1, & \text{if } \Delta(x,y) \ge n/2 + c\sqrt{n},\\ 0, & \text{if } \Delta(x,y) \le n/2 - c\sqrt{n},\\ *, & \text{otherwise.} \end{cases}$$

We also use $\mathrm{GHD}_{c,n}$ to denote the corresponding communication problem where Alice holds $x \in \{0,1\}^n$,
Bob holds $y \in \{0,1\}^n$, and the goal is for them to communicate and agree on an output bit that matches
$\mathrm{GHD}_{c,n}(x,y)$. By convention, $*$ matches both 0 and 1.

Protocols. Consider a communication problem $f : \{0,1\}^n \times \{0,1\}^n \to \{0,1,*\}$ and a protocol $\mathcal{P}$ that
attempts to solve f. We write $\mathcal{P}(x,y)$ to denote the output of $\mathcal{P}$ on input (x, y); note that this may be a
random variable, dependent on the internal coin tosses of $\mathcal{P}$, if $\mathcal{P}$ is a randomized protocol. A deterministic
protocol $\mathcal{P}$ is said to be correct for f if $\forall (x,y) : \mathcal{P}(x,y) = f(x,y)$ (the "=" is to be read as "matches").
It is said to have distributional error $\varepsilon$ under an input distribution $\rho$ if $\Pr_{(x,y)\sim\rho}[\mathcal{P}(x,y) \ne f(x,y)] \le \varepsilon$.
A randomized protocol $\mathcal{P}$, using a public random string r, is said to have error $\varepsilon$ if $\forall (x,y) : \Pr_r[\mathcal{P}(x,y) \ne f(x,y)] \le \varepsilon$.
A protocol $\mathcal{P}$ is said to be a k-round protocol if it involves exactly k messages, with Alice and
Bob taking turns to send the messages; by convention, we usually assume that Alice sends the first message
and the recipient of the last message announces the output. A 1-round protocol is also called a one-way
protocol, since the entire communication happens in the Alice $\to$ Bob direction.

Communication Complexity. The deterministic communication complexity $D(f)$ of a communication
problem f is defined to be the minimum, over deterministic protocols $\mathcal{P}$ for f, of the number of bits
exchanged by $\mathcal{P}$ for a worst-case input (x, y). By suitably varying the class of protocols over which the
minimum is taken, we obtain, e.g., the $\varepsilon$-error randomized, one-way deterministic, $\varepsilon$-error one-way randomized,
and $\varepsilon$-error $\rho$-distributional deterministic communication complexities of f, denoted $R_\varepsilon(f)$, $D^{\to}(f)$,
$R^{\to}_\varepsilon(f)$, and $D_{\rho,\varepsilon}(f)$, respectively. When the error parameter $\varepsilon$ is dropped, it is tacitly assumed to be $\frac13$; as
is well-known, the precise value of this constant is immaterial for asymptotic bounds.

Definition 3 (Near-Orthogonality). We say that strings $x, y \in \{0,1\}^n$ are c-near-orthogonal, and write
$x \perp_c y$, if $|\Delta(x,y) - n/2| < c\sqrt{n}$. Here, c is a positive real quantity, possibly dependent on n. We write
$x \not\perp_c y$ when x and y are not c-near-orthogonal. Notice that

$$\mathrm{GHD}_{c,n}(x,y) = * \iff x \perp_c y.$$

The distribution of the Hamming distance between two uniform random n-bit strings (equivalently,
the distribution of the Hamming weight of a uniform random n-bit string) is just an unbiased binomial
distribution $\mathrm{Binom}(n, \frac12)$. We shall use the following (fairly loose) bounds on the tail of this distribution
(see, e.g., Feller [Fel68]).
Fact 3. Let $T_n(c) = \Pr_x[x \not\perp_c 0^n]$, where x is distributed uniformly at random in $\{0,1\}^n$. Let $T(c) = \lim_{n\to\infty} T_n(c)$.
Then

$$2^{-3c^2-2} \le T(c) \le \frac{e^{-2c^2}}{c\sqrt{2\pi}} < 2^{-c^2}.$$
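A quick numerical check of Fact 3 can be written in Python. It rests on one standard fact: by the de Moivre-Laplace theorem, $(|x| - n/2)/\sqrt{n}$ converges to a normal variable with variance 1/4, so $T(c) = \Pr[|Z| \ge 2c]$ for a standard normal Z. The sketch compares this limit with the exact finite-n tail $T_n(c)$, tests the two outer bounds of Fact 3 for a few values of c, and computes the constant $b = T^{-1}(1/8)$ that appears in Section 3.2. The function names are ours.

```python
import math
from statistics import NormalDist


def tail_exact(n, c):
    """T_n(c) = Pr[|wt(x) - n/2| >= c*sqrt(n)] for x uniform in {0,1}^n (exact count)."""
    thr = c * math.sqrt(n)
    count = sum(math.comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= thr)
    return count / 2 ** n  # big-integer division, rounded once to a float


def tail_limit(c):
    """T(c): the limit of T_n(c), i.e. Pr[|Z| >= 2c] for a standard normal Z."""
    return 2 * (1 - NormalDist().cdf(2 * c))


def fact3_bounds(c):
    """The outer lower and upper bounds of Fact 3."""
    return 2 ** (-3 * c * c - 2), math.exp(-2 * c * c) / (c * math.sqrt(2 * math.pi))


for c in (0.5, 1.0, 1.5, 2.0):
    lo, hi = fact3_bounds(c)
    print(c, round(tail_exact(2500, c), 5), round(tail_limit(c), 5), lo <= tail_limit(c) <= hi)

# b = T^{-1}(1/8): solve Pr[|Z| >= 2b] = 1/8, i.e. 2b = Phi^{-1}(15/16).
b = NormalDist().inv_cdf(1 - 1 / 16) / 2
print(round(b, 3))
```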

There are two very natural input distributions for $\mathrm{GHD}_{c,n}$: the uniform distribution on $\{0,1\}^n \times \{0,1\}^n$,
and the (non-product) distribution that is uniform over all inputs for which the output is precisely defined.
We call this latter distribution $\mu_{c,n}$.

Definition 4 (Distributions). For $n \in \mathbb{N}$, $c \in \mathbb{R}^+$, let $\mu_{c,n}$ denote the uniform distribution on the set
$\{(x,y) \in \{0,1\}^n \times \{0,1\}^n : x \not\perp_c y\}$. Also, let $U_n$ denote the uniform distribution on $\{0,1\}^n$.

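Using Fact 3, we can show that for a constant c and suitably small $\varepsilon$, the distributional complexities
$D_{U_n\times U_n,\varepsilon}(\mathrm{GHD}_{c,n})$ and $D_{\mu_{c,n},\varepsilon}(\mathrm{GHD}_{c,n})$ are within constant factors of each other. This lets us work with the
latter and draw conclusions about the former. The latter has the advantage that it is meaningful for any
$\varepsilon < \frac12$, whereas the former is only meaningful if $\varepsilon < \frac12 T(c)$ (a constant output already errs on only about a
$\frac12 T(c)$ fraction of uniformly random inputs).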
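To make Definitions 2 to 4 concrete, here is a small Python sketch (helper names ours) that implements $\mathrm{GHD}_{c,n}$ and near-orthogonality on 0/1 tuples and enumerates the support of $\mu_{c,n}$ exhaustively for a tiny n. Since complementing y maps $\Delta(x,y)$ to $n - \Delta(x,y)$, the outputs 0 and 1 are equally frequent under $\mu_{c,n}$, which the enumeration confirms for n = 8, c = 1.

```python
import itertools
import math


def hamming(x, y):
    """Delta(x, y): number of coordinates where the 0/1 tuples x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def ghd(x, y, c):
    """GHD_{c,n}(x, y) from Definition 2; '*' marks the don't-care region."""
    n, d = len(x), hamming(x, y)
    gap = c * math.sqrt(n)
    if d >= n / 2 + gap:
        return 1
    if d <= n / 2 - gap:
        return 0
    return "*"


def near_orthogonal(x, y, c):
    """x ⊥_c y (Definition 3): |Delta(x, y) - n/2| < c*sqrt(n)."""
    return abs(hamming(x, y) - len(x) / 2) < c * math.sqrt(len(x))


def support_of_mu(n, c):
    """All pairs (x, y) with x not ⊥_c y; mu_{c,n} is uniform on this list (Definition 4)."""
    cube = list(itertools.product((0, 1), repeat=n))
    return [(x, y) for x in cube for y in cube if not near_orthogonal(x, y, c)]


pairs = support_of_mu(8, 1.0)
ones = sum(ghd(x, y, 1.0) == 1 for x, y in pairs)
print(len(pairs), ones, len(pairs) - ones)   # -> 4608 2304 2304
```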

Let $B(x, r)$ denote the Hamming ball of radius r centered at x. We shall use the following bounds on
the volume (i.e., size) of a Hamming ball. Here, $H : [0,1] \to [0,1]$ is the binary entropy function.

Fact 4. If $r = c\sqrt{n}$, then $(\sqrt{n}/c)^r \le |B(x,r)| \le n^r$.

Fact 5. If $r = \alpha n$ for some constant $0 < \alpha \le 1/2$, then $|B(x,r)| \le 2^{nH(\alpha)}$.
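Facts 4 and 5 are easy to sanity-check with exact binomial sums. The Python sketch below (helper names ours) restricts Fact 4 to perfect-square n with $\sqrt{n}$ divisible by c, so that $r = c\sqrt{n}$ is an integer, and checks Fact 5 in log form for a few radii $r \le n/2$; it is a spot check, not a proof.

```python
import math


def ball_volume(n, r):
    """|B(x, r)|: the number of n-bit strings within Hamming distance r of a fixed x."""
    return sum(math.comb(n, i) for i in range(r + 1))


def binary_entropy(a):
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)


def fact4_holds(n, c):
    """Check (sqrt(n)/c)^r <= |B(x, r)| <= n^r for r = c*sqrt(n) (sqrt(n) must be a multiple of c)."""
    root = math.isqrt(n)
    assert root * root == n and root % c == 0
    r = c * root
    return (root // c) ** r <= ball_volume(n, r) <= n ** r


def fact5_holds(n, r):
    """Check |B(x, r)| <= 2^{n H(r/n)} for r <= n/2, comparing base-2 logarithms."""
    return math.log2(ball_volume(n, r)) <= n * binary_entropy(r / n)


print([fact4_holds(n, c) for n, c in ((100, 1), (400, 2), (2500, 1))])
print([fact5_holds(n, r) for n, r in ((90, 30), (200, 50), (1000, 100))])
```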

3 Main Theorem: Multi-Round Lower Bound 
3.1 Some Basics 

In order to prove our multi-round lower bound, we need a simple, yet powerful, combinatorial lemma,
known as Sauer's Lemma [Sau72]. For this, we recall the concept of Vapnik-Chervonenkis dimension. Let
$S \subseteq \{0,1\}^n$ and $I \subseteq [n]$. We say that S shatters I if the set obtained by restricting the vectors in S to the
coordinates in I has the maximum possible size, $2^{|I|}$. We define $\text{VC-dim}(S)$ to be the maximum $|I|$ such
that S shatters I.

Lemma 6 (Sauer's Lemma). Suppose $S \subseteq \{0,1\}^n$ has $\text{VC-dim}(S) \le d$. Then

$$|S| \le \sum_{i=0}^{d} \binom{n}{i}.$$

When $d = \alpha n$ for some constant $\alpha \le 1/2$, the above sum can be upper bounded by $2^{nH(\alpha)}$. This yields the
following corollary.

Corollary 7. If $|S| > 2^{nH(\alpha)}$, for a constant $\alpha \le 1/2$, then $\text{VC-dim}(S) > \alpha n$.
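For small n, VC dimension can be computed by brute force, which makes Sauer's Lemma easy to test. In the sketch below (helper names ours; the random sets use a fixed seed), `vc_dim` stops at the first size with no shattered set, which is valid because every subset of a shattered set is shattered. A Hamming ball of radius d is an example on which the bound of Lemma 6 is tight.

```python
import itertools
import math
import random


def shatters(S, I):
    """True if restricting the vectors in S to the coordinates I yields all 2^|I| patterns."""
    patterns = {tuple(v[i] for i in I) for v in S}
    return len(patterns) == 2 ** len(I)


def vc_dim(S, n):
    """Largest |I| such that S shatters I (brute force; fine for small n)."""
    best = 0
    for size in range(1, n + 1):
        if any(shatters(S, I) for I in itertools.combinations(range(n), size)):
            best = size
        else:
            break  # no shattered set of this size, hence none of any larger size
    return best


def sauer_bound(n, d):
    return sum(math.comb(n, i) for i in range(d + 1))


rng = random.Random(0)
n = 7
cube = list(itertools.product((0, 1), repeat=n))
for size in (5, 20, 64, 100):
    S = rng.sample(cube, size)
    d = vc_dim(S, n)
    print(size, d, size <= sauer_bound(n, d))
```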

We now turn to the proof proper. It is based on a round elimination lemma that serves to eliminate the 
first round of communication of a GHD protocol, yielding a shorter protocol, but for GHD instances with 
weakened parameters. To keep track of all relevant parameters, we introduce the following notation. 

Definition 5. A $[k, n, s, c, \varepsilon]$-protocol is a deterministic k-round protocol for $\mathrm{GHD}_{c,n}$ that errs on at most an
$\varepsilon$ fraction of inputs, under the input distribution $\mu_{c,n}$, and in which each message is s bits long.
The next lemma gives us the "end point" of our round elimination argument.

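Lemma 8. There exists no $[0, n, s, c, \varepsilon]$-protocol with $n \ge 1$, $c = o(\sqrt{n})$, and $\varepsilon < \frac12$.

Proof. With these parameters, $\mu_{c,n}$ has nonempty support. Moreover, replacing y by its complement $\bar{y}$ maps
$\Delta(x,y)$ to $n - \Delta(x,y)$; this is a bijection on the support of $\mu_{c,n}$ that swaps the two outputs. This implies
$\Pr_{\mu_{c,n}}[\mathrm{GHD}_{c,n}(x,y) = 0] = \Pr_{\mu_{c,n}}[\mathrm{GHD}_{c,n}(x,y) = 1] = \frac12$. Thus, a 0-round deterministic protocol, which must have
constant output, cannot achieve error less than $\frac12$. □

3.2 The Round Elimination Lemma

The next lemma is the heart of our proof. To set up its parameters, we set $t_0 = (48\ln 2)\cdot 2^{11k}$, $t = 2^{15k}$, and
$b = T^{-1}(1/8)$, and we define a sequence $\{(n_i, s_i, c_i, \varepsilon_i)\}_{i=0}^{k}$ as follows:

$$\begin{aligned}
n_0 &= n, & s_0 &= t_0 s, & c_0 &= 10, & \varepsilon_0 &= 2^{-2^{11k}},\\
n_{i+1} &= n_i/3, & s_{i+1} &= t\, s_i, & c_{i+1} &= 2c_i, & \varepsilon_{i+1} &= \varepsilon_i / T(c_{i+1}) \qquad \text{for } 0 \le i < k. \qquad (1)
\end{aligned}$$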



Lemma 9 (Round Elimination for GHD). Suppose $0 \le i < k$ and $s_i \le n_i/20$. Suppose there exists a
$[k-i, n_i, s_i, c_i, \varepsilon_i]$-protocol. Then there exists a $[k-i-1, n_{i+1}, s_{i+1}, c_{i+1}, \varepsilon_{i+1}]$-protocol.

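Proof. Let $(n, s, c, \varepsilon) = (n_i, s_i, c_i, \varepsilon_i)$ and $(n', s', c', \varepsilon') = (n_{i+1}, s_{i+1}, c_{i+1}, \varepsilon_{i+1})$. Also, let $\mu = \mu_{c,n}$,
$\mu' = \mu_{c',n'}$, $\mathrm{GHD} = \mathrm{GHD}_{c,n}$ and $\mathrm{GHD}' = \mathrm{GHD}_{c',n'}$. Let $\mathcal{P}$ be a $[k-i, n, s, c, \varepsilon]$-protocol. Assume, WLOG,
that Alice sends the first message in $\mathcal{P}$.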
Call a string $x_0 \in \{0,1\}^n$ "good" if

$$\Pr_{(x,y)\sim\mu}[\mathcal{P}(x,y) \ne \mathrm{GHD}(x,y) \mid x = x_0] \le 2\varepsilon. \qquad (2)$$

By the error guarantee of $\mathcal{P}$ and Markov's inequality (the marginal of $\mu$ on x is uniform), the number of good strings
is at least $2^{n-1}$. There are $2^s \le 2^{n/20}$ different choices for Alice's first message. Therefore, there is a set $M \subseteq \{0,1\}^n$
of good strings such that Alice sends the same first message m on every input $x \in M$, with $|M| \ge 2^{n-1-n/20} > 2^{nH(1/3)}$.
By Corollary 7, $\text{VC-dim}(M) > n/3$. Therefore, there exists a set $I \subseteq [n]$, with $|I| = n/3 = n'$, that is shattered
by M. For strings $x' \in \{0,1\}^{n'}$ and $x'' \in \{0,1\}^{n-n'}$, we write $x' \circ x''$ to denote the string in $\{0,1\}^n$ formed
by plugging in the bits of x' and x'' (in order) into the coordinates in I and $[n] \setminus I$, respectively.
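The $x' \circ x''$ notation is easy to misread, so here is a small Python sketch of it (names ours). Because Hamming distance is a sum over coordinates, $\Delta(x' \circ x'', y' \circ y'') = \Delta(x', y') + \Delta(x'', y'')$, which is the identity $d_1 = d_2 + d_3$ used in the error analysis.

```python
def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))


def compose(xp, xpp, I, n):
    """x' ∘ x'': place the bits of xp (in order) on the sorted coordinates in I and the
    bits of xpp (in order) on the remaining coordinates [n] \\ I."""
    chosen = sorted(I)
    chosen_set = set(chosen)
    rest = [i for i in range(n) if i not in chosen_set]
    if len(xp) != len(chosen) or len(xpp) != len(rest):
        raise ValueError("lengths must match |I| and n - |I|")
    out = [0] * n
    for i, bit in zip(chosen, xp):
        out[i] = bit
    for i, bit in zip(rest, xpp):
        out[i] = bit
    return tuple(out)


n, I = 9, [1, 4, 7]                       # |I| = n/3 = n'
xp, xpp = (1, 0, 1), (0, 0, 1, 1, 0, 1)   # Alice's x' and a fixed completion x''
yp, ypp = (0, 0, 1), (1, 0, 1, 0, 0, 1)   # Bob's y' and one draw of y''
x, y = compose(xp, xpp, I, n), compose(yp, ypp, I, n)
d1 = hamming(x, y) - n / 2
d2 = hamming(xp, yp) - len(xp) / 2
d3 = hamming(xpp, ypp) - len(xpp) / 2
print(x, d1, d2 + d3)   # -> (0, 1, 0, 1, 0, 1, 0, 1, 1) -1.5 -1.5
```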

We now give a suitable $(k-i-1)$-round protocol $\mathcal{Q}$ for GHD', in which Bob sends the first message.
Consider an input $(x', y') \in \{0,1\}^{n'} \times \{0,1\}^{n'}$, with Alice holding x' and Bob holding y'. By definition of
shattering, there exists an $x'' \in \{0,1\}^{n-n'}$ such that $x := x' \circ x'' \in M$. Alice and Bob agree beforehand on
a suitable x for each possible x'. Suppose Bob were to pick a uniform random $y'' \in \{0,1\}^{n-n'}$ and form the
string $y := y' \circ y''$. Then, Alice and Bob could simulate $\mathcal{P}$ on input (x, y) using only $k-i-1$ rounds
of communication, with Bob starting, because Alice's first message in $\mathcal{P}$ would always be m. Call this
randomized protocol $\mathcal{Q}_1$. We define $\mathcal{Q}$ to be the protocol obtained by running t instances of $\mathcal{Q}_1$ in parallel,
using independent random choices of y'', and outputting the majority answer. Note that the length of each
message in $\mathcal{Q}$ is $ts = s'$. We shall now analyze the error.

Suppose $x'' \perp_b y''$. Let $d_1 = \Delta(x,y) - n/2$, $d_2 = \Delta(x',y') - n'/2$ and $d_3 = \Delta(x'',y'') - (n-n')/2$.
Clearly, $d_1 = d_2 + d_3$. Also, since $x' \not\perp_{c'} y'$ for every input (x', y') in the support of $\mu'$,

$$|d_1| \ge |d_2| - |d_3| > c'\sqrt{n'} - b\sqrt{n-n'} = \frac{(c' - b\sqrt{2})\sqrt{n}}{\sqrt{3}} \ge c\sqrt{n},$$

where we used (1) and our choice of b. Thus, $x \not\perp_c y$. The same calculation also shows that $d_1$ and $d_2$ have
the same sign, as $|d_2| > |d_3|$. Therefore $\mathrm{GHD}(x,y) = \mathrm{GHD}'(x',y')$.

For the rest of the calculations in this proof, fix an input x' for Alice, and hence, x'' and x as well. For
a fixed y', let $\mathcal{E}(y')$ denote the event that $\mathcal{P}(x,y) \ne \mathrm{GHD}(x,y)$; note that y'' remains random. Using the
above observation (at step (3) below), we can bound the probability that $\mathcal{Q}_1$ errs on input (x', y') as follows.

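$$\begin{aligned}
\Pr_{y''}[\mathcal{Q}_1(x',y') \ne \mathrm{GHD}'(x',y') \mid y'] &\le \Pr_{y''}[\mathcal{P}(x,y) \ne \mathrm{GHD}(x,y) \,\vee\, \mathrm{GHD}(x,y) \ne \mathrm{GHD}'(x',y') \mid y'] \\
&\le \Pr_{y''}[\mathcal{E}(y')] + \Pr_{y''}[\mathrm{GHD}(x,y) \ne \mathrm{GHD}'(x',y') \mid y'] \\
&\le \Pr_{y''}[\mathcal{E}(y')] + \Pr_{y''}[x'' \not\perp_b y''] \qquad (3) \\
&\le \Pr_{y''}[\mathcal{E}(y')] + T(b) \\
&= \Pr_{y''}[\mathcal{E}(y')] + 1/8, \qquad (4)
\end{aligned}$$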

where step (4) follows from our choice of b. To analyze $\mathcal{Q}$, notice that during the t-fold parallel repetition
of $\mathcal{Q}_1$, y' remains fixed while y'' varies. Thus, it suffices to understand how the repetition drives down the
sum on the right side of (4). Unfortunately, for some values of y', the sum may exceed $\frac12$, in which case it
will be driven up, not down, by the repetition. To account for this, we shall bound the expectation of the first
term of that sum, for a random y'.

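To do so, let $z \sim \mu \mid x$ be a random string independent of y. Notice that z is uniformly distributed on
a subset of $\{0,1\}^n$ of size $2^n T(c)$, whereas y is uniformly distributed on a subset of $\{0,1\}^n$ of size $2^n T(c')$.
(We are now thinking of x as being fixed and both y' and y'' as being random.) Therefore,

$$\begin{aligned}
\mathbb{E}_{y'}\Big[\Pr_{y''}[\mathcal{E}(y')]\Big] &= \Pr_{y}[\mathcal{E}(y')] = \Pr_{y}[\mathcal{P}(x,y) \ne \mathrm{GHD}(x,y)] \\
&\le \Pr_{z}[\mathcal{P}(x,z) \ne \mathrm{GHD}(x,z)] \cdot T(c)/T(c') \\
&\le 2\varepsilon\, T(c)/T(c'), \qquad (5)
\end{aligned}$$

where the last step of (5) holds because x, being good, satisfies (2). Thus, by Markov's inequality,

$$\Pr_{y'}\Big[\Pr_{y''}[\mathcal{E}(y')] > 1/8\Big] \le 16\varepsilon\, T(c)/T(c'). \qquad (6)$$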



If, for a particular y', the bad event $\Pr_{y''}[\mathcal{E}(y')] > 1/8$ does not occur, then the right side of (4) is at most
1/8 + 1/8 = 1/4. In other words, $\mathcal{Q}_1$ errs with probability at most 1/4 for this y'. By standard Chernoff
bounds, the t-fold repetition in $\mathcal{Q}$ drives this error down to $(e/4)^{t/4} < 2^{-t/10} < \varepsilon_0 < \varepsilon$. Combining this
with (6), which bounds the probability of the bad event, we get

$$\Pr_{y',r}[\mathcal{Q}(x',y') \ne \mathrm{GHD}'(x',y')] \le 16\varepsilon\, T(c)/T(c') + \varepsilon \le \varepsilon/T(c') = \varepsilon',$$

where r denotes the internal random string of $\mathcal{Q}$ (i.e., the collection of y''s used), and the second inequality
uses $16T(c) + T(c') \le 1$, which holds for our values of c.
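The Chernoff step above can be checked numerically. The Python sketch below (names ours) computes, exactly, the probability that at least half of t independent runs of $\mathcal{Q}_1$ err when each errs with probability 1/4, and compares it with the multiplicative Chernoff bound $\Pr[X \ge 2\mu] \le (e/4)^{\mu}$ for $\mu = t/4$, and with $2^{-t/10}$. Counting ties as failures is our conservative choice.

```python
import math


def majority_failure(t, p):
    """Exact Pr[Binomial(t, p) >= t/2]: at least half of t independent runs err
    (ties are counted as failures, which can only overestimate the error)."""
    return sum(math.comb(t, j) * p ** j * (1 - p) ** (t - j)
               for j in range(math.ceil(t / 2), t + 1))


for t in (8, 40, 200):
    exact = majority_failure(t, 0.25)
    chernoff = (math.e / 4) ** (t / 4)   # Pr[X >= 2*mu] <= (e/4)^mu with mu = t/4
    print(t, f"{exact:.3e}", exact <= chernoff <= 2 ** (-t / 10))
```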

Note that this error bound holds for every fixed x', and thus, when $(x', y') \sim \mu'$. Therefore, we can fix
Bob's random coin tosses in $\mathcal{Q}$ to get the desired $[k-i-1, n', s', c', \varepsilon']$-protocol. □
3.3 The Lower Bound



Having established our round elimination lemma, we obtain our lower bound in a straightforward fashion. 

Theorem 10 (Multi-round Lower Bound). Let $\mathcal{P}$ be a k-round $\frac13$-error randomized communication protocol
for $\mathrm{GHD}_{c,n}$, with $c = O(1)$, in which each message is s bits long. Then

$$s \ge \frac{n}{2^{O(k^2)}}.$$

Remark. This is a formal restatement of Theorem 1.

Proof. For simplicity, assume $c \le c_0 = 10$; then every input in the support of $\mu_{c_0,n}$ is a valid input to $\mathrm{GHD}_{c,n}$
with the same output. Our proof easily applies to a general $c = O(1)$ by a suitable modification of the parameters
in (1). Also, assume $n \ge 2^{4k^2}$, for otherwise there is nothing to prove.

By repeating $\mathcal{P}$ $(48\ln 2)\cdot 2^{11k} = t_0$ times, in parallel, and outputting the majority of the answers, we can
reduce the error to $2^{-2^{11k}} = \varepsilon_0$. The size of each message is now $t_0 s = s_0$. Fixing the random coins of the
resulting protocol gives us a $[k, n_0, s_0, c_0, \varepsilon_0]$-protocol $\mathcal{P}_0$.

Suppose $s_i \le n_i/20$ for all i, with $0 \le i < k$. We then repeatedly apply Lemma 9, k times, starting
with $\mathcal{P}_0$. Eventually, we end up with a $[0, n_k, s_k, c_k, \varepsilon_k]$-protocol. Examining (1), we see that $n_k = n/3^k$,
$s_k = 2^{15k^2} s_0 = (48\ln 2)\, 2^{15k^2+11k} s$, and $c_k = 10\cdot 2^k$. Notice that $n_k \ge 2^{4k^2}/3^k \ge 1$ and $c_k = o(\sqrt{n_k})$. We
also see that $(c_i)_{i=1}^{k}$ is an increasing sequence, whence $\varepsilon_{i+1}/\varepsilon_i = 1/T(c_{i+1}) \le 1/T(c_k) \le 2^{3c_k^2+2}$, where
the final step uses Fact 3. Thus,

$$\varepsilon_k \le \varepsilon_0 \left(2^{3c_k^2+2}\right)^k = 2^{-2^{11k}} \cdot 2^{(3(10\cdot 2^k)^2+2)k} = 2^{-2^{11k} + 300k\cdot 2^{2k} + 2k} < \tfrac12.$$

In other words, we have a $[0, n_k, s_k, c_k, \varepsilon_k]$-protocol with $n_k \ge 1$, $c_k = o(\sqrt{n_k})$ and $\varepsilon_k < \frac12$. This contradicts
Lemma 8.

Therefore, there must exist an i such that $s_i > n_i/20$. Since $(s_i)_{i=0}^{k}$ is increasing and $(n_i)_{i=0}^{k}$ is decreasing,
$s_k > n_k/20$. By the above calculations, $(48\ln 2)\, 2^{15k^2+11k} s > n/(20\cdot 3^k)$, which implies $s \ge n/2^{O(k^2)}$,
as claimed. □
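The parameter bookkeeping in this proof can be double-checked mechanically. The Python sketch below (names ours) tracks $\log_2$ of the quantities in (1), using the lower bound on $T(c)$ from Fact 3, and confirms that the resulting bound on $\log_2 \varepsilon_k$ is at most the exponent in the display above and is below $-1$. It also checks the inequality $(c' - b\sqrt{2})/\sqrt{3} \ge c$ used in Lemma 9, with $b \approx 0.767$ taken from the normal limit of T (a numerical approximation, not a value stated in the text).

```python
import math


def final_parameters(k):
    """log2(s_k / s), c_k, and an upper bound on log2(eps_k) after k eliminations, from (1)."""
    log2_t0 = math.log2(48 * math.log(2)) + 11 * k    # t_0 = (48 ln 2) * 2^{11k}
    log2_sk_over_s = log2_t0 + 15 * k * k              # s_k = t^k * t_0 * s with t = 2^{15k}
    cs = [10 * 2 ** i for i in range(k + 1)]           # c_0 = 10, c_{i+1} = 2 c_i
    # eps_{i+1} = eps_i / T(c_{i+1}) with 1/T(c) <= 2^{3c^2 + 2} (Fact 3); eps_0 = 2^{-2^{11k}}
    log2_eps_k = -(2 ** (11 * k)) + sum(3 * c * c + 2 for c in cs[1:])
    return log2_sk_over_s, cs[k], log2_eps_k


for k in (1, 2, 3):
    log2_ratio, ck, log2_eps = final_parameters(k)
    proof_exponent = -(2 ** (11 * k)) + 300 * k * 4 ** k + 2 * k   # exponent in the display above
    # Lemma 8 needs eps_k < 1/2; the conclusion reads s > n / 2^{log2(20 * 3^k) + log2(s_k / s)}.
    print(k, ck, log2_eps <= proof_exponent < -1, round(math.log2(20 * 3 ** k) + log2_ratio, 1))

# Lemma 9 needs (c' - b*sqrt(2)) / sqrt(3) >= c with c' = 2c; b is approximately T^{-1}(1/8).
b = 0.767
print(all((2 * c - b * math.sqrt(2)) / math.sqrt(3) >= c for c in (10 * 2 ** i for i in range(8))))
```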

Notice that, for constant k, the argument in the above proof in fact implies a lower bound for deterministic
protocols with small enough constant distributional error under $\mu_{c,n}$. This, in turn, extends to
distributional error under the uniform distribution, as remarked earlier.



4 Tight Deterministic One-Way Bounds 

The main result of this section is the following. 

Theorem 11. $D^{\to}(\mathrm{GHD}_{c,n}) = n - \Theta(\sqrt{n}\log n)$ for all constant c.

Definition 6. Let $x_1, x_2, y \in \{0,1\}^n$. We say that y witnesses $x_1$ and $x_2$, or that y is a witness for $(x_1, x_2)$, if
$x_1 \not\perp_c y$, $x_2 \not\perp_c y$, and $\mathrm{GHD}_{c,n}(x_1, y) \ne \mathrm{GHD}_{c,n}(x_2, y)$.

Intuitively, if $(x_1, x_2)$ have a witness, then they cannot be in the same message set. For if Alice sent the
same message on $x_1$ and $x_2$ and Bob's input y was a witness for $(x_1, x_2)$, then whatever Bob were to output,
the protocol would err on either $(x_1, y)$ or $(x_2, y)$. The next lemma characterizes which $(x_1, x_2)$ pairs have
witnesses.
Lemma 12. For all $x_1, x_2 \in \{0,1\}^n$, there exists y that witnesses $(x_1, x_2)$ if and only if $\Delta(x_1, x_2) \ge 2c\sqrt{n}$.

Proof. On the one hand, suppose y witnesses $(x_1, x_2)$. Then assume WLOG that $\Delta(x_1, y) \le n/2 - c\sqrt{n}$ and
$\Delta(x_2, y) \ge n/2 + c\sqrt{n}$. By the triangle inequality, $\Delta(x_1, x_2) \ge \Delta(x_2, y) - \Delta(x_1, y) \ge 2c\sqrt{n}$. Conversely,
suppose $\Delta(x_1, x_2) \ge 2c\sqrt{n}$. Let $L = \{i : x_1[i] = x_2[i]\}$, and let $R = \{i : x_1[i] \ne x_2[i]\}$. Suppose
y agrees with $x_1$ on all coordinates from R and half the coordinates from L. Then,
$\Delta(x_1, y) = |L|/2 = (n - \Delta(x_1, x_2))/2 \le n/2 - c\sqrt{n}$. Furthermore, y agrees with $x_2$ on no coordinates from R and half the
coordinates from L, so $\Delta(x_2, y) = |L|/2 + |R| = (n + \Delta(x_1, x_2))/2 \ge n/2 + c\sqrt{n}$. □
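The converse direction of the proof is constructive. Here is a minimal Python sketch of it (names ours); it assumes $|L|$ is even so that "half the coordinates from L" is exact, and checks the resulting distances on a small instance with n = 16 and c = 1.

```python
import math


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def ghd(x, y, c):
    n, d = len(x), hamming(x, y)
    if d >= n / 2 + c * math.sqrt(n):
        return 1
    if d <= n / 2 - c * math.sqrt(n):
        return 0
    return "*"


def build_witness(x1, x2):
    """Construction from the proof of Lemma 12: copy x1 on R = {i : x1[i] != x2[i]},
    copy x1 on the first half of L = {i : x1[i] == x2[i]}, and flip x1 on the other half.
    This sketch assumes |L| is even."""
    L = [i for i in range(len(x1)) if x1[i] == x2[i]]
    if len(L) % 2:
        raise ValueError("this sketch assumes |L| is even")
    flip = set(L[len(L) // 2:])
    return tuple(1 - bit if i in flip else bit for i, bit in enumerate(x1))


x1 = (0,) * 16
x2 = (1,) * 8 + (0,) * 8      # Delta(x1, x2) = 8 = 2c*sqrt(n) for c = 1, n = 16
y = build_witness(x1, x2)
print(hamming(x1, y), hamming(x2, y), ghd(x1, y, 1), ghd(x2, y, 1))   # -> 4 12 0 1
```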

We show that it is both necessary and sufficient for Alice to send different messages on $x_1$ and $x_2$
whenever $\Delta(x_1, x_2)$ is "large". To prove this, we need the following theorem, due to Bezrukov [Bez87],
and a claim that is easily proved using the probabilistic method (a full proof of the claim appears in the
appendix).

Theorem 13. Call a subset $A \subseteq \{0,1\}^n$ d-maximal if it is largest, subject to the constraint that $\Delta(x,y) \le d$
for all $x, y \in A$.

1. If $d = 2t$ then $B(x, t)$ is d-maximal for any $x \in \{0,1\}^n$.

2. If $d = 2t + 1$ then $B(x, t) \cup B(y, t)$ is d-maximal for any $x, y \in \{0,1\}^n$ such that $\Delta(x, y) = 1$. □

Claim 14. It is possible to cover $\{0,1\}^n$ with at most $2^{n - \Theta(\sqrt{n}\log n)}$ Hamming balls, each of radius $c\sqrt{n}$. □

Proof of Theorem 11. For the lower bound, suppose for the sake of contradiction that there is a protocol
where Alice sends only $n - c\sqrt{n}\log n$ bits. By the pigeonhole principle, there exists a set $M \subseteq \{0,1\}^n$
of inputs of size $|M| \ge 2^n / 2^{n - c\sqrt{n}\log n} = 2^{c\sqrt{n}\log n} = n^{c\sqrt{n}}$ upon which Alice sends the same message. By
Theorem 13, the Hamming ball $B(x, c\sqrt{n})$ is $2c\sqrt{n}$-maximal, and by Fact 4, $|B(x, c\sqrt{n})| < |M|$. Therefore,
there must be $x_1, x_2 \in M$ with $\Delta(x_1, x_2) > 2c\sqrt{n}$. By Lemma 12, there exists a y that witnesses $(x_1, x_2)$.
No matter what Bob outputs, the protocol errs on either $(x_1, y)$ or on $(x_2, y)$.

For a matching upper bound, Alice and Bob fix a covering $C = \{B(x_q, r)\}$ of $\{0,1\}^n$ by Hamming balls
of radius $r = c\sqrt{n}$. On input x, Alice sends Bob the Hamming ball $B(x_q, r)$ containing x. Bob selects
some $x' \in B(x_q, r)$ such that $x' \not\perp_c y$ and outputs $\mathrm{GHD}(x', y)$; such an x' exists whenever $\mathrm{GHD}(x,y) \ne *$, since
x itself qualifies. The correctness of this protocol follows from Lemma 12, as $\Delta(x, x') < 2c\sqrt{n}$ since they are
both in $B(x_q, c\sqrt{n})$. The cost of the protocol is given by Claim 14, which shows that it suffices for Alice to send
$\log\left(2^{n - \Theta(\sqrt{n}\log n)}\right) = n - \Theta(\sqrt{n}\log n)$ bits to describe each Hamming ball. □

5 One Round Randomized Lower Bound 

Next, we develop a one-way lower bound for randomized protocols. Note that our lower bound applies
to the uniform distribution, which, as mentioned in Section 1, implies space lower bounds for the Distinct
Elements problem under weaker assumptions about the input stream. Woodruff [Woo09] recently proved
similar results, also for the uniform distribution. We include our lower bound as a natural extension of the
deterministic bound.

Theorem 15. $R^{\to}_{\varepsilon}(\mathrm{GHD}_{c,n}) = \Omega(n)$.

Proof. For the sake of clarity, fix $c = 2$ and $\varepsilon = 1/10$, and suppose $\mathcal{P}$ is a one-round, $\varepsilon$-error, $o(n)$-bit
protocol for $\mathrm{GHD}_{c,n}$.
Definition 7. For $x \in \{0,1\}^n$, let $Y_x := \{y : x \perp_2 y\}$. Say that $x$ is good if $\Pr_{y \in Y_x}[\mathcal{P}(x, y) \ne \mathrm{GHD}(x, y)] \le
2\varepsilon$. Otherwise, call $x$ bad.

By Markov's inequality, at most a $1/2$-fraction of the inputs $x$ are bad. Next, fix Alice's message $m$ to maximize
the number of good $x$, and let $M = \{x \in \{0,1\}^n : x \text{ is good and Alice sends } m \text{ on input } x\}$. It follows that

$$|M| \ge 2^{n-1} / 2^{o(n)} \ge 2^{n(1 - o(1))}.$$

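Our goal is to show that, since $|M|$ is large, $\mathcal{P}$ must err on more than a $2\varepsilon$-fraction of $y \in Y_x$ for some $x \in M$,
contradicting the goodness of $x$. Note that it suffices to show that more than a $4\varepsilon$-fraction of $y \in Y_{x_1}$ witness $(x_1, x_2)$
for some $x_1, x_2 \in M$. Indeed, Bob receives the same message $m$ on $x_1$ and $x_2$, so on each such witness $y$ the protocol errs
on $(x_1, y)$ or on $(x_2, y)$; since $|Y_{x_1}| = |Y_{x_2}|$, it errs on more than a $2\varepsilon$-fraction of $Y_{x_1}$ or of $Y_{x_2}$.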

Since $|M| \ge 2^{n(1 - o(1))}$, Fact 5 and Theorem 13 imply that there exist $x_1, x_2 \in M$ with $\Delta(x_1, x_2) \ge (1 - o(1))n$. Next,
we determine the probability that a random $y \in Y_{x_1}$ witnesses $(x_1, x_2)$. Without loss of generality,
let $x_1 = 0^n$. Let $w(x) := \Pr_{y \in Y_{x_1}}[\mathrm{GHD}(x, y) \ne \mathrm{GHD}(x_1, y)]$. The following lemma shows that $w(x)$ is an
increasing function of $|x|$; we defer its proof to the appendix.

Lemma 16. For all $x, x' \in \{0,1\}^n$, $w(x) \ge w(x') \iff |x| \ge |x'|$, with equality if and only if $|x| = |x'|$.
We compute $w(x)$ by conditioning on $|y|$:

$$w(x) = \sum_{n_1 \le n/2 - c\sqrt{n}} \Pr\left[\Delta(x, y) \ge n/2 + c\sqrt{n} \;\middle|\; |y| = n_1\right] \cdot \Pr[|y| = n_1].$$

Fix $|x| =: m$, pick a random $y$ with $|y| = n_1$, and suppose there are $k$ coordinates $i$ such that $x_i = y_i = 1$.
Then $\Delta(x, y) = (m - k) + (n_1 - k) = m + n_1 - 2k$. Hence,

$$\Delta(x, y) \ge n/2 + c\sqrt{n} \iff k \le \frac{m + n_1}{2} - \frac{n}{4} - \frac{c}{2}\sqrt{n}.$$

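Note that, for a random $y$ with weight $|y| = n_1$, the probability that exactly $k$ of the $m$ coordinates have
$x_i = y_i = 1$ is given by the hypergeometric distribution $\mathrm{Hyp}(k; n, m, n_1)$. Therefore, we can express the
probability $\Pr_{|y| = n_1}[\Delta(x, y) \ge n/2 + c\sqrt{n}]$ as

$$\Pr_{|y| = n_1}\left[\Delta(x, y) \ge n/2 + c\sqrt{n}\right] = \sum_{k \le \frac{m + n_1}{2} - \frac{n}{4} - \frac{c}{2}\sqrt{n}} \mathrm{Hyp}(k; n, m, n_1).$$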
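The hypergeometric expression above can be sanity-checked against brute-force enumeration for a small $n$. This is a sketch under illustrative assumptions ($n = 16$, $c = 1$, and $x = 1^m 0^{n-m}$, which loses no generality by symmetry); `hyp_pmf`, `far_prob_formula` and `far_prob_brute` are hypothetical helper names.

```python
import math
from itertools import combinations

def hyp_pmf(k, n, m, n1):
    """Hyp(k; n, m, n1): probability that exactly k of the n1 set bits of a
    uniformly random y with |y| = n1 land on the m set bits of x."""
    return math.comb(m, k) * math.comb(n - m, n1 - k) / math.comb(n, n1)

def far_prob_formula(n, m, n1, c):
    """Sum of Hyp(k; n, m, n1) over k <= (m + n1)/2 - n/4 - (c/2)*sqrt(n)."""
    k_max = (m + n1) / 2 - n / 4 - (c / 2) * math.sqrt(n)
    return sum(hyp_pmf(k, n, m, n1) for k in range(min(m, n1) + 1) if k <= k_max)

def far_prob_brute(n, m, n1, c):
    """Fraction of y with |y| = n1 and Delta(x, y) >= n/2 + c*sqrt(n),
    by direct enumeration against x = 1^m 0^(n-m)."""
    x = [1] * m + [0] * (n - m)
    threshold = n / 2 + c * math.sqrt(n)
    hits = total = 0
    for ones in combinations(range(n), n1):
        y = [0] * n
        for i in ones:
            y[i] = 1
        hits += sum(a != b for a, b in zip(x, y)) >= threshold
        total += 1
    return hits / total

print(far_prob_formula(16, 12, 4, 1), far_prob_brute(16, 12, 4, 1))
```

For $m = 12$ and $n_1 = 4$ both values equal $445/1820$, since $\Delta(x, y) = 16 - 2k \ge 12$ exactly when $k \le 2$.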

Finally, we show that $w(x) > 4\varepsilon$ whenever $|x|$ is a sufficiently large constant fraction of $n$ (as $|x_2| = \Delta(x_1, x_2) \ge (1 - o(1))n$ is),
using the following claims, whose proofs are left to the appendix.

Claim 17. Conditioned on $|y| \le n/2 - 2\sqrt{n}$, we have $\Pr[|y| > n/2 - 2.1\sqrt{n}] < 1/3$.
Claim 18. For all $d \le n/2 - 2.1\sqrt{n}$, we have $\Pr_{|y| = d}[\Delta(x_2, y) \ge n/2 + 2\sqrt{n}] \ge 0.95$.

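It is easy to see from the previous two claims that $w(x_2) \ge 0.95 \cdot (2/3) > 0.6 > 4\varepsilon = 0.4$: conditioned on $|y| \le n/2 - 2\sqrt{n}$,
with probability at least $2/3$ we have $|y| \le n/2 - 2.1\sqrt{n}$, and then $y$ witnesses $(x_1, x_2)$ with probability at least $0.95$. □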

6 Concluding Remarks 

Our most important contribution here was to prove a multi-round lower bound on a fundamental problem in
communication complexity, the Gap-Hamming-Distance problem. As a consequence, we extended several
known $\Omega(1/\varepsilon^2)$-type space bounds for various data stream problems, such as the Distinct Elements problem,
to multi-pass algorithms. These results resolve long-standing open questions.
The most immediate open problem suggested by our work is to resolve Conjecture 12. It appears that
proving the conjecture true is going to require a technique other than round elimination, or else an extremely
powerful round elimination lemma that does not lose a constant fraction of the input length at each step. On
the other hand, proving the conjecture false is also of great interest, and such a proof might extend to
nontrivial data stream algorithms, albeit with a super-constant number of passes.
APPENDIX



A Proofs of Technical Lemmas 

We begin with a proof of Claim 14, which we restate here for convenience.

Claim 19 (Restatement of Claim 14). For any constant $c$, it is possible to cover $\{0,1\}^n$ with at most
$2^{n - \Theta(\sqrt{n}\log n)}$ Hamming balls, each with radius $r = c\sqrt{n}$.

Proof. We use the probabilistic method. Let $r := c\sqrt{n}$. For $x \in \{0,1\}^n$, let $B_x := B(x, r)$ be the Hamming
ball of radius $r$ centered at $x$. For a $t$ to be determined later, pick $x_1, \ldots, x_t$ independently and uniformly
at random from $\{0,1\}^n$. We want to show that with nonzero probability, the universe $\{0,1\}^n$ is covered by
these $t$ Hamming balls $B_{x_1}, \ldots, B_{x_t}$.

Now, fix any $x \in \{0,1\}^n$ and any $1 \le i \le t$. Since $x_i$ is uniform and $x \in B_{x_i}$ if and only if $x_i \in B_x$, we have

$$\Pr[x \in B_{x_i}] = \frac{|B_x|}{2^n} \ge 2^{\Theta(\sqrt{n}\log n) - n},$$

where the inequality follows from Fact [?].

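Let $\mathrm{BAD}_x := \bigwedge_{1 \le i \le t} (x \notin B_{x_i})$ be the event that $x$ is not covered by any of the Hamming balls we picked
at random, and let $\mathrm{BAD} := \bigvee_x \mathrm{BAD}_x$ be the event that some $x$ is not covered by the Hamming balls. We
want to bound $\Pr[\mathrm{BAD}]$. The event $\mathrm{BAD}_x$ occurs when $x \notin B_{x_i}$ for all $i$. Therefore, since the $x_i$ are independent, and using $1 - z \le e^{-z}$ for all real $z$,

$$\Pr[\mathrm{BAD}_x] \le \left(1 - 2^{\Theta(\sqrt{n}\log n) - n}\right)^t \le e^{-t \cdot 2^{\Theta(\sqrt{n}\log n) - n}}.$$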

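By the union bound,

$$\Pr[\mathrm{BAD}] \le 2^n \Pr[\mathrm{BAD}_x] \le 2^n e^{-t \cdot 2^{\Theta(\sqrt{n}\log n) - n}}.$$

Picking $t = \ln 2 \cdot (n+1) \cdot 2^{n - \Theta(\sqrt{n}\log n)} = 2^{n - \Theta(\sqrt{n}\log n)}$ ensures that $\Pr[\mathrm{BAD}] \le 2^n e^{-(n+1)\ln 2} = 1/2 < 1$. Therefore, there exists a set
of $t = 2^{n - \Theta(\sqrt{n}\log n)}$ Hamming balls of radius $c\sqrt{n}$ that cover $\{0,1\}^n$. □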
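The parameter choice in the proof of Claim 19 can be illustrated numerically. The sketch below (our own, with $c = 1$ and small $n$ chosen for illustration) computes $p = |B_x|/2^n$ exactly, sets $t = \lceil \ln 2 \cdot (n+1)/p \rceil$, and evaluates the natural logarithm of the union bound $2^n(1 - p)^t$, which should be negative, i.e. the bound is below 1.

```python
import math

def covering_union_bound(n, c):
    """For the random covering in the proof of Claim 19, return (t, log_bad):
    t = ceil(ln 2 * (n + 1) / p) with p = |B(x, r)| / 2^n and r = floor(c*sqrt(n)),
    and log_bad = ln(2^n * (1 - p)^t), the log of the union bound on Pr[BAD]."""
    r = math.floor(c * math.sqrt(n))
    ball = sum(math.comb(n, i) for i in range(r + 1))  # |B(x, r)|, exact integer
    p = ball / 2 ** n                                   # Pr[x in B_{x_i}]
    t = math.ceil(math.log(2) * (n + 1) / p)
    log_bad = n * math.log(2) + t * math.log1p(-p)      # log1p keeps tiny p accurate
    return t, log_bad

for n in (64, 256, 1024):
    t, log_bad = covering_union_bound(n, 1)
    print(n, round(math.log2(t), 1), round(log_bad, 3))
```

With this choice $t \cdot p \approx (n+1)\ln 2$, so `log_bad` comes out close to $-\ln 2$, matching the bound of $1/2$; meanwhile $\log_2 t$ stays noticeably below $n$.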

Recall that $w(x) = \Pr_{y \in Y_0}[\mathrm{GHD}(x, y) \ne \mathrm{GHD}(0, y)]$.

Lemma 20 (Restatement of Lemma 16). For all $x, x' \in \{0,1\}^n$, $w(x) \le w(x')$ if and only if $|x| \le |x'|$,
with equality if and only if $|x| = |x'|$.

Proof. If $|x| = |x'|$, then $w(x) = w(x')$ by symmetry. Further, note that $\mathrm{GHD}(x, y) = 0$ if and only if
$\mathrm{GHD}(\neg x, y) = 1$. Therefore, it suffices to handle the case where $|y| \le n/2 - c\sqrt{n}$ and $\mathrm{GHD}(0, y) = 0$.

For the rest of the proof, we assume that $x_i = x'_i$ except for the $n$th coordinate, where $x_n = 0$ and
$x'_n = 1$. Thus, $|x| = |x'| - 1$. We show that $w(x) \le w(x')$; the rest of the lemma follows by induction.

Let $Y$ be the set of strings with Hamming weight $|y| \le n/2 - c\sqrt{n}$. Partition $Y$ into the following three
sets:

• $A := \{y : |y| = n/2 - c\sqrt{n} \wedge y_n = 0\}$.

• $B := \{y : |y| < n/2 - c\sqrt{n} \wedge y_n = 0\}$.

• $C := \{y \in Y : y_n = 1\}$.
Note the one-to-one correspondence between strings in $B$ and strings in $C$ obtained by flipping the $n$th bit.
Now, consider any $y \in B$ such that $y$ witnesses $(0, x')$ but not $(0, x)$. Flipping the $n$th bit of $y$ yields a
string $y' \in C$ such that $y'$ witnesses $(0, x)$ but not $(0, x')$. Hence, among $y \in B \cup C$ there is an equal number
of witnesses for $x$ and $x'$. For any $y \in A$, $y_n = 0$, whence $\Delta(y, x') = \Delta(y, x) + 1$. Therefore, any $y \in A$ that
witnesses $(0, x)$ must also witness $(0, x')$, whence $w(x) \le w(x')$. □
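For a small instance, the monotonicity asserted by Lemma 20 can be checked by exhaustive enumeration. The sketch below assumes $n = 16$ and $c = 1$ (so $c\sqrt{n}$ is an integer), encodes strings as integers, and takes $Y_0$ to be the set of $y$ for which $\mathrm{GHD}(0, y)$ is defined; counting a $y$ only when $\mathrm{GHD}(x, y)$ is also defined is our reading of "witness". All helper names are hypothetical.

```python
import math

def popcount(v):
    """Number of set bits in the integer v."""
    return bin(v).count("1")

def ghd(d, n, c):
    """GHD from the Hamming distance d: 1 (far), 0 (close), or None (undefined)."""
    gap = c * math.sqrt(n)
    if d >= n / 2 + gap:
        return 1
    if d <= n / 2 - gap:
        return 0
    return None

def witness_fraction(x, n, c, Y0):
    """w(x): fraction of y in Y0 for which GHD(x, y) is defined and
    differs from GHD(0, y), which depends only on |y|."""
    hits = 0
    for y in Y0:
        g = ghd(popcount(x ^ y), n, c)
        hits += g is not None and g != ghd(popcount(y), n, c)
    return hits / len(Y0)

n, c = 16, 1
Y0 = [y for y in range(2 ** n) if ghd(popcount(y), n, c) is not None]
w = [witness_fraction((1 << m) - 1, n, c, Y0) for m in range(n + 1)]
print([round(v, 4) for v in w])
```

The list `w` indexes $w(x)$ by $|x|$ for $x = 1^{|x|}0^{n-|x|}$; on this toy instance it is nondecreasing, running from $0$ at $x = 0^n$ to $1$ at $x = 1^n$.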

Many claims in this paper require tight upper and lower tail bounds for binomial and hypergeometric 
distributions. We use Chernoff bounds where they apply. For other bounds, we approximate using normal 
distributions. We use Feller [Fel68] as a reference.

Definition 8. For $x \in \mathbb{R}$, let $\phi(x) := e^{-x^2/2}/\sqrt{2\pi}$ and

$$N(x) := \int_x^{\infty} \phi(y)\,dy.$$

$N(x)$ is the upper tail (one minus the cumulative distribution function) of the standard normal distribution. We use it in Fact 3 to approximate
$T(x)$. Here, we also use it to approximate tails of the binomial and hypergeometric distributions.

Lemma 21 (Feller, Chapter VII, Lemma 2). For all $x > 0$,

$$\phi(x)\left(\frac{1}{x} - \frac{1}{x^3}\right) < N(x) < \frac{\phi(x)}{x}.$$
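Lemma 21 can be spot-checked numerically, using the standard identity $N(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$ for the upper normal tail and Python's `math.erfc`. The sample points are our choice; they include the values $4$ and $4.2$ used in the proof of Claim 24.

```python
import math

def phi(x):
    """Standard normal density: exp(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def N(x):
    """Upper tail of the standard normal: integral of phi from x to infinity."""
    return 0.5 * math.erfc(x / math.sqrt(2))

for x in (1.0, 2.0, 4.0, 4.2):
    lower = phi(x) * (1 / x - 1 / x ** 3)
    upper = phi(x) / x
    print(x, lower, N(x), upper)
```

The bracket tightens as $x$ grows, which is why it is accurate enough at $x = 4$ and $x = 4.2$.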



Theorem 22 (Feller, Chapter VII, Theorem 2). For fixed $z_1 < z_2$,

$$\Pr\left[n/2 + (z_1/2)\sqrt{n} \le |y| \le n/2 + (z_2/2)\sqrt{n}\right] \sim N(z_1) - N(z_2).$$
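To see the approximation of Theorem 22 in action, the sketch below computes the exact binomial probability with big-integer arithmetic for $n = 10^4$, $z_1 = 1$, $z_2 = 2$ (illustrative choices) and compares it with $N(z_1) - N(z_2)$. The difference is of order $1/\sqrt{n}$, so the two values agree only approximately.

```python
import math

def N(x):
    """Upper tail of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def binomial_window(n, z1, z2):
    """Exact Pr[n/2 + (z1/2)sqrt(n) <= |y| <= n/2 + (z2/2)sqrt(n)]
    for y uniform on {0,1}^n, i.e. |y| ~ Binomial(n, 1/2)."""
    lo = math.ceil(n / 2 + z1 / 2 * math.sqrt(n))
    hi = math.floor(n / 2 + z2 / 2 * math.sqrt(n))
    # Exact integer numerator divided by 2^n; Python's int division handles the size.
    return sum(math.comb(n, k) for k in range(lo, hi + 1)) / 2 ** n

n = 10_000
print(binomial_window(n, 1, 2), N(1) - N(2))
```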
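Theorem 23. For any $\gamma$ such that $\gamma = \omega(1)$ and $\gamma = o(n^{1/6})$, we have

$$\sum_{k \ge n/2 + \gamma\sqrt{n}/2} \binom{n}{k} 2^{-n} \sim \frac{1}{\sqrt{2\pi}\,\gamma} e^{-\gamma^2/2}.$$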

Claim 24 (Restatement of Claim 17). Conditioned on $|y| \le n/2 - 2\sqrt{n}$,

$$\Pr[|y| > n/2 - 2.1\sqrt{n}] < 1/3.$$
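Proof. By Theorem 22 (together with the symmetry of $|y|$ about $n/2$) and Lemma 21, we have

$$\Pr\left[n/2 - 2.1\sqrt{n} \le |y| \le n/2 - 2\sqrt{n}\right] \sim N(4) - N(4.2) < \frac{\phi(4)}{4} - \phi(4.2)\left(4.2^{-1} - 4.2^{-3}\right) < 2.022 \times 10^{-5}.$$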

By Fact [?], $\Pr[|y| \le n/2 - 2\sqrt{n}] \ge 2^{-3 \cdot 2^2 - 2} = 2^{-14} \approx 6.1035 \times 10^{-5}$. Putting the two terms together, we get

$$\Pr\left[|y| > n/2 - 2.1\sqrt{n} \;\middle|\; |y| \le n/2 - 2\sqrt{n}\right] \le \frac{2.022 \times 10^{-5}}{6.1035 \times 10^{-5}} < 1/3.$$

□
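The arithmetic in the proof of Claim 24 can be reproduced as follows. The sketch takes the $2^{-14}$ lower bound from the cited Fact as given and only recomputes the numbers; `phi` is our helper for the standard normal density.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Upper bound on N(4) - N(4.2) via Lemma 21: N(4) < phi(4)/4 and
# N(4.2) > phi(4.2) * (1/4.2 - 1/4.2**3).
numerator = phi(4) / 4 - phi(4.2) * (1 / 4.2 - 1 / 4.2 ** 3)
# Lower bound on Pr[|y| <= n/2 - 2 sqrt(n)] quoted from the Fact: 2^(-3*2^2 - 2).
denominator = 2.0 ** (-3 * 2 ** 2 - 2)
ratio = numerator / denominator
print(f"{numerator:.4e}  {denominator:.4e}  {ratio:.4f}")
```

The numerator comes out at about $2.022 \times 10^{-5}$ and the ratio at about $0.331$, just under $1/3$.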

Claim 25 (Restatement of Claim 18). For all $d \le n/2 - 2.1\sqrt{n}$,

$$\Pr_{|y| = d}\left[\Delta(x_2, y) \ge n/2 + 2\sqrt{n}\right] \ge 0.95.$$
Proof. The proof follows from the following claim, instantiated with c = 2 and a = 2.1. □
Claim 26. For all $a > c$, $|x| = \gamma n$, and all $\gamma > 1 - (1 - c/a)/4$,

$$\Pr_{|y| = n/2 - a\sqrt{n}}\left[\Delta(x, y) \ge n/2 + c\sqrt{n}\right] \ge 1 - \exp\left(-\frac{2(a - c)a^2(1 + o(1))}{3a + c}\right).$$



Proof. Let $m := |x| = \gamma n$ and let $n_1 := n/2 - a\sqrt{n}$. Then the relevant probability for a random $y$ with $|y| = n_1$
can be expressed using the hypergeometric distribution $\mathrm{Hyp}(k; n, m, n_1)$: call the $m$ set bits of $x$ the
defects; the probability that exactly $k$ of the $n_1$ set bits of $y$ are defective is $\mathrm{Hyp}(k; n, m, n_1)$. Note that
$\Delta(x, y) = (m - k) + (n_1 - k) = m + n_1 - 2k$. Therefore,

$$\Delta(x, y) \ge n/2 + c\sqrt{n} \iff k \le \frac{m + n_1}{2} - \frac{n}{4} - \frac{c}{2}\sqrt{n} = \frac{\gamma n}{2} - \frac{a + c}{2}\sqrt{n}.$$

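We express the probability $\Pr_{|y| = n_1}[\Delta(x, y) \ge n/2 + c\sqrt{n}]$ as

$$\Pr_{|y| = n_1}\left[\Delta(x, y) \ge n/2 + c\sqrt{n}\right] = \Pr_{K \sim \mathrm{Hyp}(k; n, m, n_1)}\left[K \le \frac{\gamma n}{2} - \frac{a + c}{2}\sqrt{n}\right].$$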

Next, we use a concentration of measure result due to Hush and Scovel [HS05]. Here, we present a simplified
version.

Theorem 27 (Hush, Scovel). Let $m = \gamma n \ge n_1 = n/2 - a\sqrt{n}$, and let $\beta := n/(m(n - m))$. Then

$$\Pr\left[K - \mathbb{E}[K] > \mu\right] < \exp\left(-2\beta\mu^2(1 + o(1))\right).$$

The expected value of a random variable $K$ distributed according to $\mathrm{Hyp}(k; n, m, n_1)$ is

$$\mathbb{E}[K] = \frac{m n_1}{n} = \frac{\gamma n}{n}\left(\frac{n}{2} - a\sqrt{n}\right) = \frac{\gamma n}{2} - \gamma a\sqrt{n}.$$

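Set $\mu := (a - c)\sqrt{n}/4$. Note that

$$\mathbb{E}[K] + \mu = \frac{\gamma n}{2} - \gamma a\sqrt{n} + \frac{a - c}{4}\sqrt{n} \le \frac{\gamma n}{2} - \frac{a + c}{2}\sqrt{n} = \frac{m + n_1}{2} - \frac{n}{4} - \frac{c}{2}\sqrt{n},$$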

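where the inequality holds because $\gamma > 1 - (1 - c/a)/4$. Note also that $(1 - c/a)/4 = (a - c)/(4a)$, so
$1 - (1 - c/a)/4 = (3a + c)/(4a)$. By Theorem 27, and since $\gamma(1 - \gamma)$ is decreasing for $\gamma \ge 1/2$,

$$\begin{aligned}
\Pr\left[K > \frac{\gamma n}{2} - \frac{a + c}{2}\sqrt{n}\right] &\le \Pr\left[K - \mathbb{E}[K] > \mu\right] \\
&< \exp\left(-\frac{2n\mu^2(1 + o(1))}{m(n - m)}\right) \\
&= \exp\left(-\frac{2(a - c)^2(1 + o(1))}{16\gamma(1 - \gamma)}\right) \\
&\le \exp\left(-\frac{2(a - c)^2(4a)^2(1 + o(1))}{16(a - c)(3a + c)}\right) \\
&= \exp\left(-\frac{2(a - c)a^2(1 + o(1))}{3a + c}\right).
\end{aligned}$$

It follows that $\Pr\left[K \le \frac{\gamma n}{2} - \frac{a + c}{2}\sqrt{n}\right] \ge 1 - \exp\left(-\frac{2(a - c)a^2(1 + o(1))}{3a + c}\right)$. □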
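The algebra in the proof of Claim 26 can be checked numerically. The sketch below (helper names are ours; $a = 2.1$, $c = 2$ and $n = 10^6$ are illustrative) verifies that at the threshold $\gamma^* = (3a+c)/(4a)$ the exponent $2n\mu^2/(m(n-m))$ equals $2(a-c)a^2/(3a+c)$, and that $\mathbb{E}[K] + \mu$ stays below the cutoff exactly when $\gamma \ge \gamma^*$.

```python
import math

def gamma_star(a, c):
    """Threshold gamma* = 1 - (1 - c/a)/4 = (3a + c)/(4a) from Claim 26."""
    return (3 * a + c) / (4 * a)

def hush_scovel_exponent(n, a, c, gamma):
    """2 n mu^2 / (m (n - m)) with m = gamma*n and mu = (a - c) sqrt(n) / 4."""
    m = gamma * n
    mu = (a - c) * math.sqrt(n) / 4
    return 2 * n * mu ** 2 / (m * (n - m))

def mean_plus_mu_below_cutoff(n, a, c, gamma):
    """Check E[K] + mu <= gamma*n/2 - ((a + c)/2) sqrt(n), where
    E[K] = m * n1 / n, m = gamma*n and n1 = n/2 - a*sqrt(n)."""
    s = math.sqrt(n)
    m, n1 = gamma * n, n / 2 - a * s
    lhs = m * n1 / n + (a - c) * s / 4
    rhs = gamma * n / 2 - (a + c) / 2 * s
    return lhs <= rhs + 1e-9 * abs(rhs)  # small relative tolerance for float rounding

a, c, n = 2.1, 2.0, 10 ** 6
g = gamma_star(a, c)
print(g, hush_scovel_exponent(n, a, c, g), 2 * (a - c) * a ** 2 / (3 * a + c))
```

At $\gamma = \gamma^*$ the cutoff is met with equality, so the tolerance only absorbs floating-point rounding.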
Claim 28. For any $x_L \in \{0,1\}^{n_L}$, $\mathrm{GHD}(x_L, y_L)$ is defined for at least an $\alpha \ge e^{-2c'^2}/(5c')$ fraction of $y_L \in
\{0,1\}^{n_L}$.

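Proof. Without loss of generality, assume $x_L = 0$. Then $\mathrm{GHD}(x_L, y_L)$ is defined for all $y$ such that
$|y| \le n_L/2 - c'\sqrt{n_L}$ or $|y| \ge n_L/2 + c'\sqrt{n_L}$. Note that for any constant $x > c'$,

$$\begin{aligned}
\Pr_y\left[|y| \le \tfrac{n_L}{2} - c'\sqrt{n_L}\right] &\ge \Pr_y\left[\tfrac{n_L}{2} - x\sqrt{n_L} \le |y| \le \tfrac{n_L}{2} - c'\sqrt{n_L}\right] \\
&\sim N(2c') - N(2x) \\
&> \frac{e^{-(2c')^2/2}}{\sqrt{2\pi}}\left(\frac{1}{2c'} - \frac{1}{(2c')^3}\right) - \frac{e^{-(2x)^2/2}}{\sqrt{2\pi}\cdot 2x} \\
&\ge \frac{e^{-2(c')^2}}{10c'},
\end{aligned}$$

where the last inequality holds once $x$ is chosen sufficiently large. $\Pr[|y| \ge n_L/2 + c'\sqrt{n_L}]$ is bounded in the same fashion,
so the two tails together give $\alpha \ge e^{-2(c')^2}/(5c')$.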
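The final chain in the proof of Claim 28 can be spot-checked numerically. The sketch below compares $N(2c') - N(2x)$ with the per-side target $e^{-2c'^2}/(10c')$ for a few illustrative values of $c'$, taking $x = c' + 3$ as an arbitrary constant larger than $c'$; $N$ is computed via the identity $N(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$, and the helper names are ours.

```python
import math

def N(x):
    """Upper tail of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def one_sided_target(c_prime):
    """Per-side target e^{-2 c'^2} / (10 c') from the proof of Claim 28."""
    return math.exp(-2 * c_prime ** 2) / (10 * c_prime)

for c_prime in (1.0, 2.0, 3.0):
    x = c_prime + 3  # an illustrative constant x > c'
    print(c_prime, N(2 * c_prime) - N(2 * x), one_sided_target(c_prime))
```

Doubling the per-side target recovers the fraction $e^{-2c'^2}/(5c')$ stated in Claim 28.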